sf packageIn this lesson we’ll learn about sf, the package that is core to using geospatial data in R. We’ll go through the structure of the data (it’s not too different from regular data.frames!), geometries, shapefiles, and how to save your hard work.
sf package?sf objectsf objectsf object
Instructor Notes
sf?sf = simple featuressf creates geospatial data.frames that retain all of the functionality of R data.frames, but which are extended with a geometry column and with geospatial metadata, making it easy to process your data using both standard table-based operations and explicitly geospatial operations.
sfLet’s start by loading the sf library.
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
As we discussed in the initial geospatial overview, a shapefile is one type of geospatial data that holds vector data.
To learn more about ESRI Shapefiles, this is a good place to start: ESRI Shapefile Wiki Page
The tricky thing to remember about shapefiles is that they’re actually a collection of 3 to 9+ files together. Here’s a list of all the files that can make up a shapefile:
shp: The main file that stores the feature geometry
shx: The index file that stores the index of the feature geometry
dbf: The dBASE table that stores the attribute information of features
prj: The file that stores the coordinate system information. (should be required!)
xml: Metadata —Stores information about the shapefile.
cpg: Specifies the code page for identifying the character set to be used.
But it remains the most commonly used file format for vector spatial data, and it’s really easy to visualize in one go!
Let’s try it out with California counties, and use sf for the first time. sf::st_read is a flexible function that let’s you read in many different types of geospatial data.
# Read in the counties shapefile
counties = st_read('notebook_data/california_counties/CaliforniaCounties.shp')
## Reading layer `CaliforniaCounties' from data source `/Users/hikarimurayama/Documents/repos/Geospatial-Fundamentals-in-R-with-sf/notebook_data/california_counties/CaliforniaCounties.shp' using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 58 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022
## projected CRS: NAD83 / California Albers
# Plot out California counties
plot(counties)
## Warning: plotting the first 9 out of 58 attributes; use max.plot = 58 to plot
## all
Wow! That gives us a plot grid of up to the first 9 attributes (i.e. columns) in our dataset.
And what if we just want to plot a single variable?
plot(counties['MED_AGE_M'])
Wow! So easy! We just made a choropleth map of median male age, by county!
We’re off to a running start.
sf data.frameBefore we get in too deep, let’s discuss what a sf data.frame is, and how it’s different from a plain data.frame.
sf data.frameAn sf data.frame, a.k.a. an sf object, is just like a plain data.frame, but with an extra geometry column. The sf package then has a variety of geospatial functions that use that geometry column.
I repeat because it’s important:
An sf object is a plain data.frame with a geometry column added.
This means all the normal operations that we can run on
data.frames will also work onsfobjects!
With that in mind, let’s start exploring our sf object just like we would a dataset in a plain data.frame.
# Find the number of rows and columnds in counties
dim(counties)
## [1] 58 59
# Look at the first couple of rows in our sf object
head(counties)
## Simple feature collection with 6 features and 58 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## projected CRS: NAD83 / California Albers
## FID_ NAME STATE_NAME POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE
## 1 0 Kern California 839631 102.9 851089 104.282870 499766
## 2 0 Kings California 152982 109.9 155039 111.427421 83027
## 3 0 Lake California 64665 48.6 65253 49.082334 52033
## 4 0 Lassen California 34895 7.4 35039 7.422856 25532
## 5 0 Los Angeles California 9818605 2402.3 9904341 2423.264150 4936599
## 6 0 Madera California 150865 70.1 153025 71.065672 94456
## BLACK AMERI_ES ASIAN HAWN_PI HISPANIC OTHER MULT_RACE MALES FEMALES
## 1 48921 12676 34846 1252 413033 204314 37856 433108 406523
## 2 11014 2562 5620 271 77866 42996 7492 86344 66638
## 3 1232 2049 724 108 11088 5455 3064 32469 32196
## 4 2834 1234 356 165 6117 3562 1212 22416 12479
## 5 856874 72828 1346865 26094 4687889 2140632 438713 4839654 4978951
## 6 5629 4136 2802 162 80992 37380 6300 72682 78183
## AGE_UNDER5 AGE_5_9 AGE_10_14 AGE_15_19 AGE_20_24 AGE_25_34 AGE_35_44
## 1 72885 68694 68473 72493 65339 122046 108500
## 2 12877 11564 11324 11356 13158 25589 21878
## 3 3633 3574 3888 4190 3362 6603 7095
## 4 1625 1595 1853 2107 2831 6337 5513
## 5 645793 633690 678845 753630 752788 1475731 1430326
## 6 11983 11756 11755 12224 11032 20562 19167
## AGE_45_54 AGE_55_64 AGE_65_74 AGE_75_84 AGE_85_UP MED_AGE MED_AGE_M MED_AGE_F
## 1 108479 77285 43502 23473 8462 30.7 30.2 31.4
## 2 20282 12924 6844 3809 1377 31.1 31.9 30.0
## 3 10255 10625 6553 3502 1385 45.0 43.8 45.9
## 4 5447 4113 1984 1041 449 37.0 35.5 40.9
## 5 1368947 1013156 568470 345603 151626 34.8 33.6 35.9
## 6 19291 15833 9868 5468 1926 33.1 31.5 34.3
## HOUSEHOLDS AVE_HH_SZ HSEHLD_1_M HSEHLD_1_F MARHH_CHD MARHH_NO_C MHH_CHILD
## 1 254610 3.15 23580 25629 74254 58472 12615
## 2 41233 3.19 3313 3884 12993 9299 2115
## 3 26548 2.39 3913 3970 4061 7381 926
## 4 10058 2.50 1348 1231 2068 3094 413
## 5 3241204 2.98 360530 424398 790374 690291 115984
## 6 43317 3.28 3342 3909 12660 12558 2056
## FHH_CHILD FAMILIES AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC
## 1 28989 191739 3.61 284367 29757 152828 101782
## 2 4818 31939 3.59 43867 2634 22329 18904
## 3 2051 16255 2.94 35492 8944 17472 9076
## 4 710 6800 2.98 12710 2652 6590 3468
## 5 296976 2194080 3.58 3445076 203872 1544749 1696455
## 6 4020 34093 3.63 49140 5823 27726 15591
## NO_FARMS07 AVG_SIZE07 CROP_ACR07 AVG_SALE07 SQMI CountyFIPS
## 1 2117 1116 942827 1513.53 8161.35 06103
## 2 1129 603 512870 1203.20 1391.39 06089
## 3 845 147 28997 72.31 1329.46 06106
## 4 459 1000 82567 120.92 4720.42 06086
## 5 1734 63 49158 187.94 4087.19 06073
## 6 1708 398 290683 579.70 2153.29 06102
## NEIGHBORS PopNeigh NEIGHBOR_1 PopNeigh_1 NEIGHBOR_2
## 1 San Bernardino,Tulare,Inyo 2495935 <NA> NA <NA>
## 2 Fresno,Kern,Tulare 2212260 <NA> NA <NA>
## 3 <NA> 0 <NA> NA <NA>
## 4 <NA> 0 <NA> NA <NA>
## 5 San Bernardino,Kern 2874841 <NA> NA <NA>
## 6 Mono,Fresno 944652 <NA> NA <NA>
## PopNeigh_2 geometry
## 1 NA MULTIPOLYGON (((193446 -244...
## 2 NA MULTIPOLYGON (((12524.03 -1...
## 3 NA MULTIPOLYGON (((-240632.1 9...
## 4 NA MULTIPOLYGON (((-45364.03 3...
## 5 NA MULTIPOLYGON (((173874.5 -4...
## 6 NA MULTIPOLYGON (((16681.21 -1...
# Look at all the variables included in our data
colnames(counties)
## [1] "FID_" "NAME" "STATE_NAME" "POP2010" "POP10_SQMI"
## [6] "POP2012" "POP12_SQMI" "WHITE" "BLACK" "AMERI_ES"
## [11] "ASIAN" "HAWN_PI" "HISPANIC" "OTHER" "MULT_RACE"
## [16] "MALES" "FEMALES" "AGE_UNDER5" "AGE_5_9" "AGE_10_14"
## [21] "AGE_15_19" "AGE_20_24" "AGE_25_34" "AGE_35_44" "AGE_45_54"
## [26] "AGE_55_64" "AGE_65_74" "AGE_75_84" "AGE_85_UP" "MED_AGE"
## [31] "MED_AGE_M" "MED_AGE_F" "HOUSEHOLDS" "AVE_HH_SZ" "HSEHLD_1_M"
## [36] "HSEHLD_1_F" "MARHH_CHD" "MARHH_NO_C" "MHH_CHILD" "FHH_CHILD"
## [41] "FAMILIES" "AVE_FAM_SZ" "HSE_UNITS" "VACANT" "OWNER_OCC"
## [46] "RENTER_OCC" "NO_FARMS07" "AVG_SIZE07" "CROP_ACR07" "AVG_SALE07"
## [51] "SQMI" "CountyFIPS" "NEIGHBORS" "PopNeigh" "NEIGHBOR_1"
## [56] "PopNeigh_1" "NEIGHBOR_2" "PopNeigh_2" "geometry"
It looks like we have a good amount of information about the total population for different years and the densities, as well as race, age, and occupancy info.
sf objectsWe’re able to map our sf object because of the extra geometry column.
sf GeometriesThere are three main types of geometries that can be associated with your sf object: points, lines and polygons:
In an sf data.frame these geometries are encoded in a format known as Well-Known Text (WKT). For example:
- POINT (30 10)
- LINESTRING (30 10, 10 30, 40 40)
- POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
where coordinates are separated by a space and coordinate pairs by a comma
Your sf object may also include the variants multipoints, multilines, and multipolgyons if some of the features are composed multiple parts. For example, if we had data representing US states (one per row), we could use a POLYGON geometry for states like Utah or Colorado, but would need a MULTIPOLYGON for states like Hawaii, which includes many islands.
Question What kind of geometry would a roads
sf object have? What about one that includes landmarks in the San Francisco Bay Area?
Just like with other plots we can make in R, we can start customizing our maps’ colors, title, etc.
The most rudimentary way to do this would be to use base R’s plot function to plot our sf objects geometries.
# Plot our geometries, coloring them pale, with dark green borders
plot(counties$geometry, col='tan', border='darkgreen', main="CA counties")
However, we’ll get much more customizability if we use a special-purpose mapping package, rather than just relying on sf methods of base R functions.
Our go-to mapping package of choice will be tmap. Its name stands for “thematic maps”, i.e. maps in which we use dimensions of our dataset to control the visualization parameters of our maps, thus creating effective data visualizations.
You’ll get plenty of introduction here in the workshop, but for additional support you can check out the tmap vignette or Google other tutorials and references.
Let’s start by loading the package and creating a ‘quick tmap’.
# load tmap
library(tmap)
# plot a 'quick tmap'
qtm(counties)
Nice!
That’s the quickest, simplest example of a static map that tmap can make. However, tmap has 2 modes: - ‘plot’ mode: static maps - ‘view’ mode: interactive maps
tmap loads up in ‘plot’ mode. Let’s switch it to ‘view’ mode and then take a look at that same map.
We could either set the mode explicitly (tmap_mode('view')) or just toggle back and forth using the ttm (‘toggle tmap mode’) function. Let’s use the latter.
# toggle the mode
ttm()
## tmap mode set to interactive viewing
# then make our quick tmap again
qtm(counties)